Combined commodity mining method based on knowledge graph rule embedding

ABSTRACT

The present invention is a combined commodity mining method based on knowledge graph rule embedding, comprising: expressing rules, commodities, attributes, and attribute values as embeddings; splicing the embeddings of the rules and the embeddings of the attributes and inputting them into a first neural network to obtain an importance score for each attribute; splicing the embeddings of the rules and the attributes and inputting them into a second neural network to obtain the embedding of the attribute value that each rule should take under each attribute; calculating a similarity between the values of the two inputted commodities under the attribute and the attribute value embedding predicted by the model; after calculating the scores of all attribute-attribute value pairs, summing them to obtain the score of the two commodities under the rule; computing a cross entropy loss between this score and the true label of the two commodities, and training iteratively with a gradient-descent-based optimization algorithm; and, after the model is trained, parsing the rule embeddings in a similar way to obtain rules that can be understood by human beings.

FIELD OF TECHNOLOGY

The present invention relates to the field of knowledge graph rules, in particular to a combined commodity mining method based on knowledge graph rule embedding.

BACKGROUND TECHNOLOGY

In a knowledge graph, triples (head, relation, tail) are used to represent knowledge. This knowledge can be represented with one-hot vectors, but because there are too many entities and relations, the dimensions become too large, and one-hot vectors cannot capture the similarity between two entities or relations that are very close. Inspired by the Word2Vec model, many methods for representing entities and relations with distributed representations (knowledge graph embedding, KGE) have been proposed in the academic community, such as TransE, TransH, and TransR. The basic idea of these models is that, by learning the graph structure, the head, relation, and tail can each be represented by a low-dimensional dense vector. For example, TransE makes the sum of the head vector and the relation vector as close as possible to the tail vector. In TransE, a triple is scored as:

$f_{r}(h, t) = \left\| h + r - t \right\|_{L_{1}/L_{2}}$

For a correct triple (h, r, t) ∈ Δ, the score should be lower, while for a wrong triple (h′, r′, t′) ∈ Δ′, the score should be higher. The final loss function is:

$L = \sum_{(h,r,t) \in \Delta} \sum_{(h',r',t') \in \Delta'} \max\left( f_{r}(h,t) - f_{r'}(h',t') + \gamma, 0 \right)$

The knowledge graph contains only correct triples (golden triples), so a negative instance is generated by corrupting a correct triple, that is, by randomly replacing one of its head entity, tail entity, or relation with another entity or relation, which yields a set of negative instances Δ′. By continuously optimizing this loss function, the representations of h, r, and t are finally learned.
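As a minimal sketch of the scoring function and margin loss described above, the following pure-Python fragment computes the TransE score, the hinge term for one correct/corrupted pair, and a corrupted triple. The function names, the margin value `gamma=1.0`, and the uniform choice of which position to corrupt are illustrative assumptions, not details given in this document.

```python
import random

def transe_score(h, r, t, norm=1):
    """f_r(h, t) = ||h + r - t|| under the L1 (norm=1) or L2 (norm=2) norm."""
    diffs = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return sum(abs(d) for d in diffs)
    return sum(d * d for d in diffs) ** 0.5

def margin_loss(pos, neg, gamma=1.0, norm=1):
    """max(f(pos) - f(neg) + gamma, 0) for one correct/corrupted pair.

    pos and neg are (h, r, t) tuples of embedding vectors;
    gamma is an assumed margin value.
    """
    return max(transe_score(*pos, norm=norm)
               - transe_score(*neg, norm=norm) + gamma, 0.0)

def corrupt(triple, entity_ids, relation_ids, rng):
    """Build a negative triple by replacing exactly one of head, relation, tail."""
    h, r, t = triple
    slot = rng.choice(["head", "relation", "tail"])  # assumed uniform choice
    if slot == "head":
        h = rng.choice([e for e in entity_ids if e != h])
    elif slot == "relation":
        r = rng.choice([x for x in relation_ids if x != r])
    else:
        t = rng.choice([e for e in entity_ids if e != t])
    return (h, r, t)

# A correct triple whose head plus relation equals its tail scores 0.
positive = ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
negative = ([1.0, 0.0], [0.0, 1.0], [0.0, 0.0])
loss = margin_loss(positive, negative)
```

In this toy case the corrupted triple already scores more than `gamma` above the correct one, so the hinge term contributes nothing to the loss.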

In the field of e-commerce, likewise, there is also a commodity knowledge graph. In the commodity knowledge graph, the head entity refers to a commodity, the relation refers to a commodity attribute, and the tail entity refers to an attribute value of the commodity. Therefore, embeddings of the commodity, commodity attribute, and commodity attribute value can be learned through the KGE method, and then used in a downstream task.

In the field of e-commerce, merchants sometimes need to bundle and sell several commodities together. On the one hand, the total price of the bundled commodities is generally lower than the sum of the selling prices of the individual commodities, so that part of the profit is passed on to users and they are more motivated to buy. On the other hand, a seller can make more profit by selling several commodities at the same time than by selling one commodity. Therefore, there is a great demand for combined commodity sales in practical applications, which requires a method that can automatically help sellers combine several commodities that can be sold together.

However, the KGE-based method has the disadvantage that, although it can predict whether two commodities belong to a combination, the seller does not know why the two commodities are combined, so it is necessary to provide interpretability for this combination. Based on this, there is an urgent need for a method that lets sellers intuitively know why two commodities can be sold together.

SUMMARY OF THE INVENTION

The present invention provides a combined commodity mining method based on knowledge graph rule embeddings. By expressing the combined commodity rules as embeddings, and then parsing the learned rule embeddings into specific rules, it can help merchants to construct combined commodities that can be sold together.

A combined commodity mining method based on knowledge graph rule embeddings, comprising:

-   (1) constructing a knowledge graph of commodities, wherein for each triple in the knowledge graph, the head entity is a commodity I, the relation is a commodity attribute P, and the tail entity is a commodity attribute value V;
-   (2) expressing the commodity I, commodity attribute P, and commodity attribute value V as embeddings, respectively, and randomly initializing the embeddings of a plurality of rules;
-   (3) splicing the embedding of each rule and the embedding of each commodity attribute and inputting them into a first neural network to obtain an importance score s₁ of the commodity attribute;
-   (4) splicing the embedding of each rule and the embedding of each commodity attribute and inputting them into a second neural network to obtain the embedding V_(pred) of the attribute value that the rule should take under that attribute;
-   (5) splicing the embedding of each rule and the embedding of each commodity attribute and inputting them into a third neural network to obtain a probability score p that, under the rule, the value of the attribute is "same";
-   (6) if the attribute values of two commodities under a certain attribute are different, calculating a similarity score s₂₁ of V_(pred) and V₁, and a similarity score s₂₂ of V_(pred) and V₂; and if the attribute values of the two commodities under the certain attribute are the same, calculating a similarity score s₂ of V_(pred) and V_(true);
-   wherein V₁ is the embedding of the attribute value of one of the two commodities under the attribute, V₂ is the embedding of the attribute value of the other commodity under the attribute, and V_(true) is the embedding of the shared attribute value;
-   (7) when the importance score s₁ of a certain attribute is greater than a threshold thres₁ and the attribute values of the two commodities under the attribute are the same, taking the score score_(ij) of this attribute-attribute value pair as s₁ × (p + (1 − p) × s₂); when the importance score s₁ of a certain attribute is greater than thres₁ and the attribute values of the two commodities under the attribute are different, taking the score score_(ij) of this attribute-attribute value pair as 0.5 × s₁ × (s₂₁ + s₂₂); when the importance score s₁ of a certain attribute is less than or equal to thres₁, taking the score of this attribute-attribute value pair as 0;
-   (8) summing up the scores score_(ij) of the m attribute-attribute value pairs of a commodity pair to obtain score_(i):
-   $\text{score}_{i} = \sum_{j = 1}^{m} \text{score}_{ij}$
-   (9) averaging the scores score_(i) of the commodity pair under the n rules to obtain a final score of the commodity pair:
-   $\text{score} = \left( \sum_{i = 1}^{n} \text{score}_{i} \right) / n$
-   (10) comparing the obtained score of the commodity pair with the label, 0 or 1, indicating whether the pair is a combined commodity, to obtain a cross entropy loss; iteratively solving with a gradient-descent-based optimization algorithm until the loss value converges, whereby the parameters of the three neural networks are trained and the rule embeddings are learned at the same time; and
-   (11) for the learned rule embeddings, utilizing the trained neural networks for analysis to obtain the rules of commodity combination.
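Steps (7) to (9) above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, the default `thres1=0.5` is an assumed value (the document leaves the threshold unspecified), and the inputs s₁, p, s₂, s₂₁, s₂₂ are taken as already computed by the three networks and the similarity function.

```python
def pair_score(s1, p, same, s2=None, s21=None, s22=None, thres1=0.5):
    """Score of one attribute-attribute value pair (step 7).

    s1   : importance score of the attribute (first network)
    p    : probability that the rule takes "same" (third network)
    same : True if both commodities share the attribute value
    thres1 is an assumed threshold value.
    """
    if s1 <= thres1:
        return 0.0                      # attribute is not part of the rule
    if same:
        return s1 * (p + (1 - p) * s2)  # shared-value case
    return 0.5 * s1 * (s21 + s22)       # different-value case

def rule_score(pair_scores):
    """Step (8): sum over the m attribute-attribute value pairs."""
    return sum(pair_scores)

def final_score(rule_scores):
    """Step (9): average over the n rules."""
    return sum(rule_scores) / len(rule_scores)

# Hypothetical scores for two attributes under one rule, plus a second rule
# under which no attribute passes the threshold.
score_rule_0 = rule_score([
    pair_score(0.8, 0.5, True, s2=1.0),
    pair_score(0.8, 0.5, False, s21=0.5, s22=0.5),
])
score = final_score([score_rule_0, 0.0])
```

Note that, as written in the steps above, a rule score is a plain sum over attributes, so nothing in this sketch forces the final score into the interval (0, 1).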

In step (1), each triple in the commodity knowledge graph has the form (I, P, V), which represents that the attribute value of the commodity I under the attribute P is V. Different commodities are associated with the same attribute or attribute value, thus forming the structure of the graph.

In step (2), the commodity I, the commodity attribute P, the commodity attribute value V, and the plurality of rules are each numbered with an id; each id then constitutes a one-hot vector, and the one-hot vector is mapped into an embedding, which is continuously optimized during the model training process.
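The id, one-hot, and embedding mapping of step (2) can be illustrated as follows. This is a sketch under assumptions: the vocabulary entries, the embedding size of 4, and the initialization range are all hypothetical, and a single shared table is used only for brevity. It shows that multiplying a one-hot vector by the embedding table is the same as looking up one row of that table.

```python
import random

def one_hot(index, size):
    """One-hot vector with a 1.0 at position `index`."""
    return [1.0 if k == index else 0.0 for k in range(size)]

def embed(one_hot_vec, table):
    """Map a one-hot vector to its embedding: one_hot_vec times table."""
    dim = len(table[0])
    return [
        sum(one_hot_vec[row] * table[row][col] for row in range(len(table)))
        for col in range(dim)
    ]

rng = random.Random(0)

# Hypothetical vocabulary mixing commodities, an attribute, a value, and a rule.
vocab = ["commodity_1", "commodity_2", "Brand", "Estee Lauder", "rule_0"]
ids = {name: i for i, name in enumerate(vocab)}

# Randomly initialized embedding table (assumed dimension 4); in training,
# these entries would be updated together with the network parameters.
table = [[rng.uniform(-0.1, 0.1) for _ in range(4)] for _ in vocab]

brand_emb = embed(one_hot(ids["Brand"], len(vocab)), table)
```

In practice the multiplication is replaced by a direct row lookup, which gives the same vector without materializing the one-hot vector.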

In steps (3) to (5), in the three neural networks, the activation function of each layer of neurons is:

$\text{RELU}(x) = \max(0, x)$

wherein the RELU function examines each element of the matrix in turn: if the value of the element is greater than 0, the value is kept; otherwise the value is set to 0.

In the three neural networks, the calculation formula of each layer of each neural network is:

$l_{1} = \text{RELU}\left( W_{1}\,\text{concat}\left( r_{i}, p_{j} \right) \right)$

$l_{2} = \text{RELU}\left( W_{2} l_{1} + b_{1} \right)$

$l_{3} = \text{RELU}\left( W_{3} l_{2} + b_{2} \right)$

⋯

$l_{L} = \text{sigmoid}\left( W_{L} l_{L - 1} + b_{L - 1} \right)$

wherein W₁, W₂, ..., W_(L) and b₁, b₂, ..., b_(L) are all parameters that need to be learned; W₁, W₂, W₃, ..., W_(L) are randomly initialized matrices of sizes dim_(emb)*dim₁, dim₁*dim₂, dim₂*dim₃, ..., dim_(L-1)*dim_(L) respectively; b₁, b₂, ..., b_(L) are randomly initialized vectors of sizes dim₁, dim₂, dim₃, ..., dim_(L) respectively; L is the number of layers of the neural network; and the nonlinear activation function

$\text{sigmoid}(z) = \frac{1}{1 + e^{-z}}$

limits the output value to the interval (0, 1).
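A forward pass through one such network can be sketched in plain Python as below. The layer sizes, the random initialization range, and the row-major weight layout (one row per output unit) are assumptions made for illustration; the first layer has no bias term, following the formula for l₁ above, and the input width is the length of the concatenated rule and attribute embeddings.

```python
import math
import random

def relu(v):
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in v]

def sigmoid(v):
    """Element-wise 1 / (1 + e^(-x)), mapping into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def linear(W, x, b=None):
    """Compute W x (+ b), with W stored as a list of rows."""
    out = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    if b is not None:
        out = [o + bi for o, bi in zip(out, b)]
    return out

def mlp_forward(rule_emb, attr_emb, weights, biases):
    """weights = [W_1, ..., W_L]; biases holds one vector per layer 2..L."""
    x = rule_emb + attr_emb                    # concat(r_i, p_j)
    x = relu(linear(weights[0], x))            # l_1 has no bias term
    for W, b in zip(weights[1:-1], biases[:-1]):
        x = relu(linear(W, x, b))              # hidden layers
    return sigmoid(linear(weights[-1], x, biases[-1]))

rng = random.Random(0)
sizes = [6, 4, 4, 1]  # assumed: two 3-dim embeddings in, one score out
weights = [
    [[rng.uniform(-0.5, 0.5) for _ in range(sizes[k])]
     for _ in range(sizes[k + 1])]
    for k in range(len(sizes) - 1)
]
biases = [
    [rng.uniform(-0.1, 0.1) for _ in range(sizes[k + 1])]
    for k in range(1, len(sizes) - 1)
]

s1 = mlp_forward([0.1, 0.2, 0.3], [0.4, 0.5, 0.6], weights, biases)[0]
```

With a final layer of width 1, the output can be read directly as an importance score or a probability; a network that predicts an embedding would instead end in a linear layer of embedding width.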

In step (6), the similarity scores s₂₁, s₂₂, and s₂ are all calculated by cosine similarity, and the specific formulas are:

$s_{21} = \text{cos\_sim\_1} = \frac{V_{pred} \cdot V_{1}}{\left\| V_{pred} \right\| \left\| V_{1} \right\|}$

$s_{22} = \text{cos\_sim\_2} = \frac{V_{pred} \cdot V_{2}}{\left\| V_{pred} \right\| \left\| V_{2} \right\|}$

$s_{2} = \text{cos\_sim} = \frac{V_{pred} \cdot V_{true}}{\left\| V_{pred} \right\| \left\| V_{true} \right\|}$
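The cosine similarity used for s₂₁, s₂₂, and s₂ can be written as a short helper. The function name and the example vectors are illustrative assumptions, and the sketch assumes that neither vector is all zeros.

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their L2 norms.

    Assumes both vectors are non-zero; a zero vector would divide by zero.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical predicted embedding and the two commodities' value embeddings.
v_pred = [1.0, 1.0]
v_1 = [1.0, 0.0]
v_2 = [0.0, 1.0]

s21 = cosine_similarity(v_pred, v_1)
s22 = cosine_similarity(v_pred, v_2)
```

Because the predicted vector lies midway between the two value embeddings here, both similarities come out equal.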

In step (10), the cross entropy loss function is:

$H(prob, y) = -\sum_{i} y(i) \log\left( prob(i) \right)$

wherein prob(i) and y(i) are both probability distributions, i is an integer with 0≤i<K, y(i) ∈ {0, 1} is the real probability distribution, 0≤prob(i)≤1 is the probability distribution predicted by the model, Σ_(i) y(i) = 1, Σ_(i) prob(i) = 1, and K is the total number of categories; here, K is 2. This cross entropy function is used to measure the difference between two distributions: the larger the value calculated by this formula, the greater the difference between the two distributions.
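For K = 2 the loss reduces to the usual binary cross entropy. The sketch below assumes the final score of a commodity pair lies in [0, 1] and is read as the probability of the "combined" class; the clamp `eps` is an added numerical safeguard and not part of the formula, and the function names are hypothetical.

```python
import math

def cross_entropy(prob, y, eps=1e-12):
    """H(prob, y) = -sum_i y(i) * log(prob(i)).

    eps clamps probabilities away from 0 to avoid log(0); it is an
    assumption added for numerical safety.
    """
    return -sum(yi * math.log(max(pi, eps)) for pi, yi in zip(prob, y))

def pair_loss(score, label):
    """Cross entropy for one commodity pair with K = 2 categories.

    score : predicted probability that the pair is a combined commodity
    label : 1 if the pair is a combined commodity, else 0
    """
    prob = [1.0 - score, score]         # predicted distribution
    y = [1.0 - label, float(label)]     # real (one-hot) distribution
    return cross_entropy(prob, y)

confident_right = pair_loss(0.9, 1)
confident_wrong = pair_loss(0.1, 1)
```

A prediction close to the true label gives a small loss and a prediction far from it gives a large one, which matches the reading of the formula as a distance between the two distributions.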

Preferably, the gradient-descent-based optimization algorithm is SGD or Adam.

The specific process of step (11) is:

-   for the learned rule embedding and each commodity pair, splicing the rule embedding and the embedding of each attribute of the commodity pair and inputting them into the first network to obtain the importance score of each attribute;
-   if the score s₁ of the attribute is greater than the threshold thres₁, then including the attribute in this rule;
-   if the attribute is comprised in this rule, and the attribute values of the two commodities under this attribute are the same, calculating a probability p of taking "same" under this attribute; if p is greater than the threshold thres₂, then taking the value under this attribute as "same"; if p is less than or equal to the threshold thres₂, then calculating the similarity score s₂ of the two commodities under this attribute; if s₂ is greater than the threshold thres₃, then taking, by the rule, the attribute value shared by the two commodities under this attribute;
-   if the attribute is comprised in this rule, and the attribute values of the two commodities under this attribute are not the same, then calculating the similarity scores s₂₁ and s₂₂; if both s₂₁ and s₂₂ are greater than the threshold thres₃, then taking, by the rule, the two attribute values of these two commodities under this attribute.
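The decision procedure above can be sketched as a single function that returns what a rule takes for one attribute. The function name, the return encoding, and the default thresholds of 0.5 are assumptions. The text does not say what happens when an attribute passes thres₁ but no value passes thres₂ or thres₃; this sketch returns `None` in that case, which is also an assumption.

```python
def parse_attribute(s1, same, values, p=None, s2=None, s21=None, s22=None,
                    thres1=0.5, thres2=0.5, thres3=0.5):
    """Return the rule entry for one attribute, or None if there is none.

    s1     : importance score of the attribute under the rule
    same   : True if both commodities share the attribute value
    values : tuple with the shared value, or with the two different values
    All threshold defaults are assumed values.
    """
    if s1 <= thres1:
        return None                  # attribute is not included in the rule
    if same:
        if p > thres2:
            return "same"            # rule only requires equal values
        if s2 > thres3:
            return (values[0],)      # rule takes the shared value itself
        return None
    if s21 > thres3 and s22 > thres3:
        return tuple(values)         # rule takes both values
    return None

# Hypothetical scores for the attributes of one commodity pair.
brand_entry = parse_attribute(0.9, True, ("Estee Lauder",), p=0.8)
effects_entry = parse_attribute(
    0.9, False, ("Whitening", "Moisturizing"), s21=0.7, s22=0.8)
place_entry = parse_attribute(0.2, False, ("Jiangsu Province",
                                           "Guangdong Province"))
```

Collecting the non-`None` entries over all attributes gives the body of one parsed rule.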

Compared with the prior art, the present invention has the following beneficial effects:

The present invention integrates the learning of the rules into the training process of the model, and finally parses the learned rule embeddings into rules. Based on the rules, the seller can know why two commodities can be combined for sale, which can bring great benefits for e-commerce sales of commodities.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic flowchart of a combined commodity mining method based on knowledge graph rule embedding according to the present invention.

DETAILED DESCRIPTION OF THE EMBODIMENTS

The present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be pointed out that the following embodiments are intended to facilitate the understanding of the present invention, but do not have any limiting effect on it.

As shown in FIG. 1, a combined commodity mining method based on knowledge graph rule embeddings comprises the following steps:

S01. constructing a knowledge graph of commodities, wherein for each triple, the head entity is a commodity, the relation is a commodity attribute, and the tail entity is a commodity attribute value. The task of combining commodities is defined as follows: given two commodities in the commodity knowledge graph, and a plurality of attributes and attribute values of each commodity, it is necessary to determine whether the two commodities are a combined commodity. The innovation of the present invention is that rule learning is integrated into the model training process, so that a seller can be provided with interpretability through the learned rules.

S02. first expressing each commodity, commodity attribute, commodity attribute value, and rule as an id, and then indexing each id to an embedding. For each sample, the two inputted commodities have m attributes and attribute values; together with the n inputted rules, the present invention predicts whether the two commodities are a combined commodity.

S03. first calculating a score of each attribute. The embedding of each rule and the embedding of each commodity attribute are spliced and inputted into a first neural network to obtain an importance score s₁ of the attribute. The formula for each layer of the first neural network is:

$l_{11} = \text{RELU}\left( W_{11}\,\text{concat}\left( r_{i}, p_{j} \right) \right)$

$l_{12} = \text{RELU}\left( W_{12} l_{11} + b_{11} \right)$

$l_{13} = \text{RELU}\left( W_{13} l_{12} + b_{12} \right)$

⋯

$s_{1} = \text{sigmoid}\left( W_{1L} l_{1(L - 1)} + b_{1(L - 1)} \right)$

Specifically, by splicing the embedding of each rule and the embedding of the commodity attribute and passing them through fully connected layers, progressively higher-level semantics are obtained, and the importance score s₁ of the attribute under the rule is finally predicted from these high-level semantics; a larger value means that the attribute is more likely to be included in this rule. A threshold thres₁ is preset: when the value of s₁ is greater than thres₁, the attribute is included in this rule.

S04. then calculating a score of each attribute value. The embedding of each rule and the embedding of each commodity attribute are spliced and inputted into a second neural network to obtain a predicted embedding of the attribute value. The formula for each layer of the second neural network is:

$l_{21} = \text{RELU}\left( W_{21}\,\text{concat}\left( r_{i}, p_{j} \right) \right)$

$l_{22} = \text{RELU}\left( W_{22} l_{21} + b_{21} \right)$

$l_{23} = \text{RELU}\left( W_{23} l_{22} + b_{22} \right)$

⋯

$V_{pred} = W_{2L} l_{2(L - 1)} + b_{2(L - 1)}$

Specifically, the rules and attributes are inputted into a multi-layer neural network, which finally predicts the embedding of the attribute value that should be taken under each attribute. Next, there are two cases. If the inputted attribute values of the two commodities under the attribute are the same, then the similarity between this attribute value and the predicted attribute value is calculated. The higher the similarity, the higher the score of the attribute value. The similarity of the attribute value is calculated as follows:

$s_{2} = \text{cos\_sim} = \frac{V_{pred} \cdot V_{true}}{\left\| V_{pred} \right\| \left\| V_{true} \right\|}$

Meanwhile, it is possible that, under this rule, the value under this attribute is "same". In this case, the embedding of each rule and the embedding of each commodity attribute are spliced and inputted into a third neural network, so as to obtain the probability that the value under this attribute is "same". The formula of the third neural network is:

$l_{31} = \text{RELU}\left( W_{31}\,\text{concat}\left( r_{i}, p_{j} \right) \right)$

$l_{32} = \text{RELU}\left( W_{32} l_{31} + b_{31} \right)$

$l_{33} = \text{RELU}\left( W_{33} l_{32} + b_{32} \right)$

⋯

$p = \text{sigmoid}\left( W_{3L} l_{3(L - 1)} + b_{3(L - 1)} \right)$

If the inputted attribute values of the two commodities under the attribute are different, then the similarity between each of the two attribute values and the predicted attribute value is calculated separately, and the two similarity scores are then combined to obtain the score for the two attribute values. The similarity of the attribute values is calculated as follows:

$s_{21} = \text{cos\_sim\_1} = \frac{V_{pred} \cdot V_{1}}{\left\| V_{pred} \right\| \left\| V_{1} \right\|}$

$s_{22} = \text{cos\_sim\_2} = \frac{V_{pred} \cdot V_{2}}{\left\| V_{pred} \right\| \left\| V_{2} \right\|}$

s₂ = 0.5 * (s₂₁ + s₂₂)

S05. next, calculating the score of an attribute-attribute value pair. There are three cases: if the score s₁ of the attribute is less than or equal to the preset threshold thres₁, then the score of the attribute value under the attribute is 0; if the score s₁ of the attribute is greater than the preset threshold thres₁ and the attribute values of the two commodities under this attribute are the same, then the score of this attribute-attribute value pair is

s₁ * (p + (1 − p) * s₂)

if the score s₁ of the attribute is greater than the preset threshold thres₁ and the attribute values of the two commodities under this attribute are different, then the score of this attribute-attribute value pair is

0.5 * s₁ * (s₂₁ + s₂₂)

S06. after obtaining the scores of the attribute-attribute value pairs, calculating the score of a commodity pair under a certain rule, wherein the calculation formula is:

$\text{score}_{\text{i}} = \,{\sum_{j = 1}^{m}{score_{\text{ij}}}}$

S07. after obtaining the score of a commodity pair under a certain rule, averaging the scores of the commodity pair under all the rules to obtain the final score of the commodity pair, wherein the calculation formula is:

$\text{score} = \left( {\sum_{i = 1}^{n}{score_{\text{i}}}} \right)\text{/n}$

S08. comparing the obtained score of the commodity pair with the label, 0 or 1, indicating whether the pair is a combined commodity, to obtain a cross entropy loss, where p(x) here denotes the real label distribution and q(x) the distribution predicted by the model:

$H(p, q) = -\sum_{x} p(x) \log\left( q(x) \right)$

This loss function is then optimized with an Adam optimizer.

S09. after the rules are learned, parsing the rules, wherein the way of parsing the rules is similar to that used during training. First, the embedding of the rule and the embedding of each possible attribute are spliced and inputted into the first network to obtain the importance score of each attribute; if the score s₁ of the attribute is greater than the threshold thres₁, then this attribute is included in this rule. Then, if the attribute is included in this rule, it is determined whether the value under the rule is "same" or a specific value.

In this way, the combined commodity rules can be obtained. In the final application, there are mainly two ways of using them:

-   The first way is as follows:
-   Given a commodity pair and the respective attributes and attribute values of each commodity, this information is inputted into the model to obtain the probability score that the two commodities in the pair can be combined into a combined commodity; if the score is greater than 0.5, the two commodities are considered to be a combined commodity.
-   The second way is as follows:
-   Given a commodity pair and the respective attributes and attribute values of each commodity, the rules generated by the present invention are checked one by one to see whether each attribute-attribute value pair conforms to the current rule; if all attribute-attribute value pairs conform to the current rule, then, based on the current rule, it can be determined that the two commodities belong to a combined commodity. If none of the rules can determine that the two commodities belong to a combined commodity, then the two commodities do not constitute a combined commodity.

Next, a specific example is used to illustrate the construction process of the present invention.

Table 1 shows a sample of the model input, which contains two commodities. Each commodity has a plurality of attributes and attribute values, and under each attribute, the attribute values of the two commodities may or may not be the same.

Table 1

|                                    | Commodity 1      | Commodity 2        |
|------------------------------------|------------------|--------------------|
| Brand                              | Estee Lauder     | Estee Lauder       |
| Production Place                   | Jiangsu Province | Guangdong Province |
| Effects                            | Whitening        | Moisturizing       |
| Series                             | Whitening cream  | Essence lotion     |
| Whether being a combined commodity | Yes              |                    |

First, all attributes and attribute values of the two commodities are represented as embeddings. Each attribute is then passed through the first neural network to obtain the importance score of the attribute, and the second neural network is used to obtain the attribute value score. The attribute score and the attribute value score are then combined to obtain the attribute-attribute value pair score. Next, the scores of all attribute-attribute value pairs are summed to obtain the score that the two commodities belong to a combined commodity under this rule. Finally, the scores of these two commodities under all the rules are combined to obtain the final score that the two commodities belong to a combined commodity.

During the testing phase, the rules need to be parsed. Table 2 shows a rule parsed by the model based on the sample shown in Table 1.

Table 2

| Head        | Body                                                |
|-------------|-----------------------------------------------------|
| Combination | (Effects, whitening, moisturizing) && (Brand, same) |

The way of parsing the rule is similar to that in the training process: it first determines which attributes the rule contains, then determines which attribute values should be taken under each attribute, and in this way the rule is parsed.
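As an illustration of the second application way (checking a commodity pair against parsed rules), the following sketch applies the rule of Table 2 to the sample of Table 1. The dictionary encoding of rules and commodities, the function names, and the interpretation that a rule matches when every attribute it contains is satisfied are assumptions; the value strings use the capitalization of Table 1.

```python
SAME = "same"

def matches(rule, item1, item2):
    """True if the commodity pair satisfies every attribute in the rule.

    rule maps an attribute either to SAME or to a tuple of required values.
    """
    for attr, expected in rule.items():
        v1, v2 = item1.get(attr), item2.get(attr)
        if v1 is None or v2 is None:
            return False                 # attribute missing on one commodity
        if expected == SAME:
            if v1 != v2:
                return False             # values must be equal
        elif {v1, v2} != set(expected):
            return False                 # pair must take exactly these values
    return True

def is_combination(rules, item1, item2):
    """The pair is a combined commodity if at least one rule matches."""
    return any(matches(rule, item1, item2) for rule in rules)

# Sample from Table 1.
commodity_1 = {"Brand": "Estee Lauder", "Production Place": "Jiangsu Province",
               "Effects": "Whitening", "Series": "Whitening cream"}
commodity_2 = {"Brand": "Estee Lauder", "Production Place": "Guangdong Province",
               "Effects": "Moisturizing", "Series": "Essence lotion"}

# Rule from Table 2: (Effects, whitening, moisturizing) && (Brand, same).
rules = [{"Effects": ("Whitening", "Moisturizing"), "Brand": SAME}]

result = is_combination(rules, commodity_1, commodity_2)
```

Because the rule only mentions Effects and Brand, the differing Production Place and Series values do not prevent a match, which is what makes the parsed rule readable as an explanation for the seller.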

The above-mentioned embodiments describe the technical solutions and beneficial effects of the present invention in detail. It should be understood that the above-mentioned embodiments are only specific embodiments of the present invention and are not intended to limit the present invention. Any modifications, additions, and equivalent replacements made within the principle of the present invention shall be included within the protection scope of the present invention.

1. A combined commodity mining method based on knowledge graph rule embeddings, comprising:

(1) constructing a knowledge graph of commodities, wherein for each triple in the knowledge graph, the head entity is a commodity I, the relation is a commodity attribute P, and the tail entity is a commodity attribute value V;

(2) expressing the commodity I, commodity attribute P, and commodity attribute value V as embeddings, respectively, and randomly initializing the embeddings of a plurality of rules;

(3) splicing the embedding of each rule and the embedding of each commodity attribute and inputting them into a first neural network to obtain an importance score s₁ of the commodity attribute;

(4) splicing the embedding of each rule and the embedding of each commodity attribute and inputting them into a second neural network to obtain the embedding V_(pred) of the attribute value that the rule should take under that attribute;

(5) splicing the embedding of each rule and the embedding of each commodity attribute and inputting them into a third neural network to obtain a probability score p that, under the rule, the value of the attribute is "same";

(6) if the attribute values of two commodities under a certain attribute are different, calculating a similarity score s₂₁ of V_(pred) and V₁, and a similarity score s₂₂ of V_(pred) and V₂; and if the attribute values of the two commodities under the certain attribute are the same, calculating a similarity score s₂ of V_(pred) and V_(true); wherein V₁ is the embedding of the attribute value of one of the two commodities under the attribute, V₂ is the embedding of the attribute value of the other commodity under the attribute, and V_(true) is the embedding of the shared attribute value;

(7) when the importance score s₁ of a certain attribute is greater than a threshold thres₁ and the attribute values of the two commodities under the attribute are the same, taking the score score_(ij) of this attribute-attribute value pair as s₁ × (p + (1 − p) × s₂); when the importance score s₁ of a certain attribute is greater than thres₁ and the attribute values of the two commodities under the attribute are different, taking the score score_(ij) of this attribute-attribute value pair as 0.5 × s₁ × (s₂₁ + s₂₂); when the importance score s₁ of a certain attribute is less than or equal to thres₁, taking the score of this attribute-attribute value pair as 0;

(8) summing up the scores score_(ij) of the m attribute-attribute value pairs of a commodity pair to obtain score_(i):

$\text{score}_{i} = \sum_{j = 1}^{m} \text{score}_{ij}$

(9) averaging the scores score_(i) of the commodity pair under the n rules to obtain a final score of the commodity pair:

$\text{score} = \left( \sum_{i = 1}^{n} \text{score}_{i} \right) / n$

(10) comparing the obtained score of the commodity pair with the label, 0 or 1, indicating whether the pair is a combined commodity, to obtain a cross entropy loss; iteratively solving with a gradient-descent-based optimization algorithm until the loss value converges, whereby the parameters of the three neural networks are trained and the rule embeddings are learned at the same time; and

(11) for the learned rule embeddings, utilizing the trained neural networks for analysis to obtain the rules of commodity combination.

2. The combined commodity mining method based on knowledge graph rule embedding according to claim 1, wherein, in step (2), the commodity I, the commodity attribute P, the commodity attribute value V, and the plurality of rules are each numbered with an id; each of the ids then constitutes a one-hot vector, and the one-hot vector is mapped into an embedding, which is continuously optimized during the model training process.

3. The combined commodity mining method based on knowledge graph rule embedding according to claim 1, wherein, in steps (3) to (5), in the three neural networks, the activation function of each layer of neurons is:

$\text{RELU}(x) = \max(0, x)$

wherein the RELU function examines each element of the matrix in turn: if the value of the element is greater than 0, the value is kept; otherwise the value is set to 0.

4. The combined commodity mining method based on knowledge graph rule embedding according to claim 1, wherein, in steps (3) to (5), in the three neural networks, the calculation formula of each layer of each neural network is:

$l_{1} = \text{RELU}\left( W_{1}\,\text{concat}\left( r_{i}, p_{j} \right) \right)$

$l_{2} = \text{RELU}\left( W_{2} l_{1} + b_{1} \right)$

$l_{3} = \text{RELU}\left( W_{3} l_{2} + b_{2} \right)$

⋯

$l_{L} = \text{sigmoid}\left( W_{L} l_{L - 1} + b_{L - 1} \right)$

wherein W₁, W₂, ..., W_(L) and b₁, b₂, ..., b_(L) are all parameters that need to be learned; W₁, W₂, W₃, ..., W_(L) are randomly initialized matrices of sizes dim_(emb)*dim₁, dim₁*dim₂, dim₂*dim₃, ..., dim_(L-1)*dim_(L) respectively; b₁, b₂, ..., b_(L) are randomly initialized vectors of sizes dim₁, dim₂, dim₃, ..., dim_(L) respectively; L is the number of layers of the neural network; and the nonlinear activation function $\text{sigmoid}(z) = \frac{1}{1 + e^{-z}}$ limits the output value to the interval (0, 1).
5. The combined commodity mining method based on knowledge graph rule embedding according to claim 1, wherein, in step (6), the similarity scores s₂₁, s₂₂, and s₂ are all calculated by cosine similarity, and the specific formulas are:

$s_{21} = \text{cos\_sim\_1} = \frac{V_{pred} \cdot V_{1}}{\left\| V_{pred} \right\| \left\| V_{1} \right\|};$

$s_{22} = \text{cos\_sim\_2} = \frac{V_{pred} \cdot V_{2}}{\left\| V_{pred} \right\| \left\| V_{2} \right\|};$

$s_{2} = \text{cos\_sim} = \frac{V_{pred} \cdot V_{true}}{\left\| V_{pred} \right\| \left\| V_{true} \right\|}.$

6. The combined commodity mining method based on knowledge graph rule embedding according to claim 1, wherein, in step (10), the cross entropy loss function is:

$H(prob, y) = -\sum_{i} y(i) \log\left( prob(i) \right)$

wherein prob(i) and y(i) are both probability distributions, i is an integer with 0≤i<K, y(i) ∈ {0, 1} is the real probability distribution, 0≤prob(i)≤1 is the probability distribution predicted by the model, $\sum_{i} y(i) = 1$, $\sum_{i} prob(i) = 1$, and K is the total number of categories; here, K is 2. This cross entropy function is used to measure the difference between two distributions: the larger the value calculated by this formula, the greater the difference between the two distributions.

7. The combined commodity mining method based on knowledge graph rule embedding according to claim 1, wherein, in step (10), the gradient-descent-based optimization algorithm is SGD or Adam.

8. The combined commodity mining method based on knowledge graph rule embedding according to claim 1, wherein the specific process of step (11) is: for the learned rule embedding and each commodity pair, splicing the rule embedding and the embedding of each attribute of the commodity pair and inputting them into the first network to obtain the importance score of each attribute; if the importance score s₁ of the attribute is greater than the threshold thres₁, then including the attribute in this rule; if the attribute is comprised in this rule, and the attribute values of the two commodities under this attribute are the same, calculating a probability p of taking "same" under this attribute; if p is greater than the threshold thres₂, then taking the value under this attribute as "same"; if p is less than or equal to the threshold thres₂, then calculating the similarity score s₂ of the two commodities under this attribute; if s₂ is greater than the threshold thres₃, then taking, by the rule, the attribute value shared by the two commodities under this attribute; if the attribute is comprised in this rule, and the attribute values of the two commodities under this attribute are not the same, then calculating the similarity scores s₂₁ and s₂₂; if both s₂₁ and s₂₂ are greater than the threshold thres₃, then taking, by the rule, the two attribute values of these two commodities under this attribute.